Multihead self-attention with $H$ heads and head dimension $d_H$ is defined as

$$\alpha^h = \mathrm{softmax}\!\left(\frac{X w_Q^h \,(X w_K^h)^\top}{\sqrt{d_H}}\right), \qquad \mathrm{MSA}(X)_i = \sum_{h=1}^{H} \alpha^h_i \, X w_V^h w_O^h. \quad (7)$$

We only prove the claim for $i$, as the proof for $j$ is analogous. The node identifier $P \in \mathbb{R}^{n \times d_p}$ is a matrix with $n$ orthonormal rows, and the type identifier is a trainable matrix $E \in \mathbb{R}^{\mathrm{bell}(k) \times d_e}$ with $\mathrm{bell}(k)$ rows $E^{\gamma_1}, \ldots, E^{\gamma_{\mathrm{bell}(k)}}$, each designated for an order-$k$ equivalence class. We now let $w_{\mathrm{in}} = [I, 0]$, where $I \in \mathbb{R}^{(d + kd_p + d_e) \times (d + kd_p + d_e)}$ is an identity matrix and $0 \in \mathbb{R}^{(d + kd_p + d_e) \times (d_T - (d + kd_p + d_e))}$ is a matrix filled with zeros. We now let the type identifiers $E^{\gamma_1}, \ldots, E^{\gamma_{\mathrm{bell}(k)}}$ be radially equispaced unit vectors on any two-dimensional subspace (Figure 6). For a given query index $i$, let us assume there exists at least one key index $j$ such that $(i, j) \in \mu_3$. Therefore, with Eq. (42), we are simply duplicating each output entry $F_i = \ldots$

With batch size 1024 on 8 RTX 3090 GPUs, fine-tuning takes 12 hours.
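A minimal NumPy sketch of the pieces defined above: Eq. (7)-style multihead self-attention, orthonormal node identifiers, and radially equispaced type identifiers. All shapes, variable names, and the random initialization are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def msa(X, w_Q, w_K, w_V, w_O):
    # MSA(X)_i = sum_h alpha^h_i X w_V^h w_O^h, with
    # alpha^h = softmax(X w_Q^h (X w_K^h)^T / sqrt(d_H))
    H, _, d_H = w_Q.shape
    out = np.zeros((X.shape[0], w_O.shape[-1]))
    for h in range(H):
        alpha = softmax((X @ w_Q[h]) @ (X @ w_K[h]).T / np.sqrt(d_H))
        out += alpha @ X @ w_V[h] @ w_O[h]
    return out

rng = np.random.default_rng(0)
n, d, H, d_H = 6, 8, 2, 4
X = rng.normal(size=(n, d))
w_Q, w_K, w_V = (rng.normal(size=(H, d, d_H)) for _ in range(3))
w_O = rng.normal(size=(H, d_H, d))
print(msa(X, w_Q, w_K, w_V, w_O).shape)  # (6, 8)

# Node identifiers P: an orthogonal matrix has orthonormal rows (d_p = n here).
P, _ = np.linalg.qr(rng.normal(size=(n, n)))

# Type identifiers: bell(k) radially equispaced unit vectors in a 2-D plane.
B = 5  # e.g. bell(3) = 5 equivalence classes of order-3 multi-indices
theta = 2 * np.pi * np.arange(B) / B
E = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # rows are unit vectors
```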
On the Universality of Graph Neural Networks on Large Random Graphs
We study the approximation power of Graph Neural Networks (GNNs) on latent position random graphs. In the large graph limit, GNNs are known to converge to certain "continuous" models known as c-GNNs, which directly enables a study of their approximation power on random graph models. In the absence of input node features, however, just as GNNs are limited by the Weisfeiler-Lehman isomorphism test, c-GNNs will be severely limited on simple random graph models. For instance, they will fail to distinguish the communities of a well-separated Stochastic Block Model (SBM) with constant degree function. Thus, we consider recently proposed architectures that augment GNNs with unique node identifiers, referred to here as Structural GNNs (SGNNs). We study the convergence of SGNNs to their continuous counterparts (c-SGNNs) in the large random graph limit, under new conditions on the node identifiers. We then show that c-SGNNs are strictly more powerful than c-GNNs in the continuous limit, and prove their universality on several random graph models of interest, including most SBMs and a large class of random geometric graphs. Our results cover both permutation-invariant and permutation-equivariant architectures.
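As an illustration of the SGNN idea described in this abstract (a sketch under my own assumptions, not the paper's construction), a standard message-passing layer can simply be fed unique node identifiers in place of the missing input features:

```python
import numpy as np

def mp_layer(A, X, W_self, W_neigh):
    # h_i' = relu(h_i W_self + mean_{j ~ i} h_j W_neigh)
    deg = np.maximum(A.sum(axis=1, keepdims=True), 1)
    return np.maximum(X @ W_self + (A @ X / deg) @ W_neigh, 0)

rng = np.random.default_rng(0)
n = 5
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T            # symmetric adjacency, no self-loops
ids = np.eye(n)                           # unique one-hot node identifiers
H = mp_layer(A, ids, rng.normal(size=(n, n)), rng.normal(size=(n, n)))
print(H.shape)                            # (5, 5)
```

With constant input features, two nodes in symmetric positions would receive identical embeddings; the unique identifiers break exactly this symmetry.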
A Appendix
We begin by formally defining multihead self-attention and the Transformer. Our definition is equivalent to Vaswani et al. (2017) [68], except that we omit layer normalization for simplicity, as in [81, 23, 34]. Consequently, each equivalence class $\gamma$ in Definition 3 is a distinct set of all order-$l$ multi-indices having a specific equality pattern. Now, for each equivalence class, we define the corresponding basis tensor as follows (Definition 4). Given a set of features $X \in \mathbb{R}^{\ldots}$ ...

Proof of Lemma 1 (Section 3.3). To prove Lemma 1, we need to show that each basis tensor $B^\gamma$ ... Our key idea is to break down the inclusion test $(i, j) \in \mu$ into equivalent but simpler Boolean tests that can be implemented in self-attention (Eq. ...). To achieve this, we first show some supplementary lemmas.
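The equality-pattern view of equivalence classes and basis tensors can be made concrete with a short sketch. The helper names below are hypothetical, and the construction follows my reading of Definitions 3 and 4: one indicator tensor per equality pattern, $\mathrm{bell}(l)$ of them in total.

```python
import itertools
import numpy as np

def pattern(idx):
    # Canonical equality pattern of a multi-index, e.g. (3, 3, 1) -> (0, 0, 1).
    seen, out = {}, []
    for i in idx:
        seen.setdefault(i, len(seen))
        out.append(seen[i])
    return tuple(out)

def basis_tensors(n, l):
    # One indicator tensor per equality pattern; bell(l) tensors in total.
    tensors = {}
    for idx in itertools.product(range(n), repeat=l):
        g = pattern(idx)
        tensors.setdefault(g, np.zeros((n,) * l))[idx] = 1.0
    return tensors

B = basis_tensors(n=3, l=2)
print(sorted(B))    # [(0, 0), (0, 1)]: bell(2) = 2 classes
print(B[(0, 0)])    # the diagonal (equality pattern i = j)
```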
Towards Invariance to Node Identifiers in Graph Neural Networks
Bechler-Speicher, Maya, Eliasof, Moshe, Schonlieb, Carola-Bibiane, Gilad-Bachrach, Ran, Globerson, Amir
Message-Passing Graph Neural Networks (GNNs) are known to have limited expressive power, due to their message passing structure. One mechanism for circumventing this limitation is to add unique node identifiers (IDs), which break the symmetries that underlie the expressivity limitation. In this work, we highlight a key limitation of the ID framework, and propose an approach for addressing it. We begin by observing that the final output of the GNN should clearly not depend on the specific IDs used. We then show that in practice this does not hold, and thus the learned network does not possess this desired structural property. Such invariance to node IDs may be enforced in several ways, and we discuss their theoretical properties. We then propose a novel regularization method that effectively enforces ID invariance on the network. Extensive evaluations on both real-world and synthetic tasks demonstrate that our approach significantly improves ID invariance and, in turn, often boosts generalization performance.
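One natural way to implement such a regularizer, sketched below as an illustration of the idea rather than the paper's exact loss, is to run the network under two independently sampled ID assignments and penalize the output discrepancy; `model`, `d_id`, and `lam` are assumed names.

```python
import torch

def id_invariance_loss(model, A, X, d_id):
    # Run the model under two independent random ID assignments and
    # penalize the squared discrepancy between the two outputs.
    outs = []
    for _ in range(2):
        ids = torch.randn(X.shape[0], d_id)        # fresh random node IDs
        outs.append(model(A, torch.cat([X, ids], dim=1)))
    return ((outs[0] - outs[1]) ** 2).mean()

# Usage with a stand-in "model" (any callable taking (A, X_augmented)):
toy_model = lambda A, Xa: A @ Xa
A, X = torch.eye(5), torch.randn(5, 3)
loss = id_invariance_loss(toy_model, A, X, d_id=2)
# total = task_loss + lam * loss   (lam: regularization weight)
```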
Pure Transformers are Powerful Graph Learners
Kim, Jinwoo, Nguyen, Tien Dat, Min, Seonwoo, Cho, Sungjun, Lee, Moontae, Lee, Honglak, Hong, Seunghoon
We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning both in theory and practice. Given a graph, we simply treat all nodes and edges as independent tokens, augment them with token embeddings, and feed them to a Transformer. With an appropriate choice of token embeddings, we prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers, which is already more expressive than all message-passing Graph Neural Networks (GNNs). When trained on a large-scale graph dataset (PCQM4Mv2), our method coined Tokenized Graph Transformer (TokenGT) achieves significantly better results compared to GNN baselines and competitive results compared to Transformer variants with sophisticated graph-specific inductive bias. Our implementation is available at https://github.com/jw9730/tokengt.
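A minimal sketch of this tokenization, with assumed dimensions and helper names (the released code at the URL above is the authoritative version): every node and every edge becomes one token, augmented with orthonormal node identifiers and a node/edge type embedding, then fed to a plain Transformer encoder.

```python
import torch
import torch.nn as nn

def tokenize(X, E, edges, P, type_emb):
    # Node v -> [x_v, P_v, P_v, t_node]; edge (u, v) -> [e_uv, P_u, P_v, t_edge].
    node_tok = torch.cat([X, P, P, type_emb[0].expand(X.shape[0], -1)], dim=1)
    u, v = edges
    edge_tok = torch.cat([E, P[u], P[v], type_emb[1].expand(E.shape[0], -1)], dim=1)
    return torch.cat([node_tok, edge_tok], dim=0)

n, d = 4, 8
X = torch.randn(n, d)                           # node features
edges = torch.tensor([[0, 1, 2], [1, 2, 3]])    # three edges (u -> v)
E = torch.randn(edges.shape[1], d)              # edge features
P, _ = torch.linalg.qr(torch.randn(n, n))       # orthonormal node identifiers
type_emb = torch.randn(2, 4)                    # node/edge type embeddings (trainable in practice)

tokens = tokenize(X, E, edges, P, type_emb)     # (n + m, d + 2n + 4) = (7, 20)
layer = nn.TransformerEncoderLayer(d_model=tokens.shape[1], nhead=2,
                                   batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
out = encoder(tokens.unsqueeze(0))              # (1, 7, 20)
```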
benedekrozemberczki/MixHop-and-N-GCN
Recent methods generalize convolutional layers from Euclidean domains to graph-structured data by approximating the eigenbasis of the graph Laplacian. This simplification prevents the model from learning delta operators, the very premise of the graph Laplacian. In this work, we propose a new Graph Convolutional layer which mixes multiple powers of the adjacency matrix, allowing it to learn delta operators. Our layer exhibits the same memory footprint and computational complexity as a GCN. We illustrate the strength of our proposed layer on both synthetic graph datasets and on several real-world citation graphs, setting a new state-of-the-art on Pubmed.
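The mixing of adjacency powers can be sketched in a few lines; the normalization, shapes, and choice of powers below are assumptions for illustration, not the repository's exact layer:

```python
import numpy as np

def mixhop_layer(A_hat, X, weights, powers=(0, 1, 2)):
    # Concatenate relu(A_hat^p X W_p) across the chosen adjacency powers p.
    outs = []
    for p, W in zip(powers, weights):
        H = X.copy()
        for _ in range(p):
            H = A_hat @ H                       # H = A_hat^p X
        outs.append(np.maximum(H @ W, 0))
    return np.concatenate(outs, axis=1)

rng = np.random.default_rng(0)
n, d, d_out = 5, 8, 4
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T + np.eye(n)      # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt             # symmetric normalization
X = rng.normal(size=(n, d))
Ws = [rng.normal(size=(d, d_out)) for _ in range(3)]
print(mixhop_layer(A_hat, X, Ws).shape)         # (5, 12)
```

Because each power gets its own weight matrix before concatenation, the layer can represent differences between hop distances, which a single-power GCN layer cannot.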
benedekrozemberczki/ClusterGCN
Graph convolutional networks (GCNs) have been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer either from a high computational cost that grows exponentially with the number of GCN layers, or from a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as follows: at each step, it samples a block of nodes associated with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search within this subgraph. This simple but effective strategy leads to significantly improved memory and computational efficiency while achieving test accuracy comparable to previous algorithms.
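A toy sketch of this training scheme, using a random partition as a stand-in for the METIS-style graph clustering the paper uses (`train_fn` is an assumed placeholder for one SGD update):

```python
import numpy as np

def cluster_gcn_step(A, X, y, clusters, step, train_fn):
    # Pick one cluster, restrict the graph to that block, and update on it.
    nodes = clusters[step % len(clusters)]
    sub = np.ix_(nodes, nodes)
    return train_fn(A[sub], X[nodes], y[nodes])

rng = np.random.default_rng(0)
n, d = 12, 4
A = (rng.random((n, n)) < 0.3).astype(float)
A = np.triu(A, 1); A = A + A.T
X, y = rng.normal(size=(n, d)), rng.integers(0, 2, size=n)
clusters = np.array_split(rng.permutation(n), 3)   # toy stand-in for METIS
stats = cluster_gcn_step(A, X, y, clusters, step=0,
                         train_fn=lambda A, X, y: float((A @ X).mean()))
```

Restricting message passing to a within-cluster block keeps the memory footprint proportional to the cluster size rather than to the full graph.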
benedekrozemberczki/NGCN
Recent methods generalize convolutional layers from Euclidean domains to graph-structured data by approximating the eigenbasis of the graph Laplacian. This simplification prevents the model from learning delta operators, the very premise of the graph Laplacian. In this work, we propose a new Graph Convolutional layer which mixes multiple powers of the adjacency matrix, allowing it to learn delta operators. Our layer exhibits the same memory footprint and computational complexity as a GCN. We illustrate the strength of our proposed layer on both synthetic graph datasets and on several real-world citation graphs, setting a new state-of-the-art on Pubmed.